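Building concurrent systems in Elixir takes more than spawning processes; it requires a rigorous methodology aimed at 99.9999999% reliability (nine nines). That corresponds to roughly 1 second of downtime every 30 years. To reach this standard, we use the Five-Question Framework.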
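The downtime figure can be checked with a little arithmetic. The sketch below is only an illustration: the module name `NineNines` and its function are hypothetical, and it assumes an average year of 365.25 days.

```elixir
defmodule NineNines do
  # Seconds in an average year, assuming 365.25 days (an assumption of this sketch).
  @seconds_per_year 365.25 * 24 * 60 * 60

  # Expected downtime in seconds over `years` for an availability with `nines` nines.
  # Nine nines (99.9999999%) leaves an unavailable fraction of 10^-9.
  def downtime_seconds(nines, years) do
    unavailable_fraction = :math.pow(10, -nines)
    unavailable_fraction * @seconds_per_year * years
  end
end

# Nine nines over 30 years: a little under one second.
IO.inspect(NineNines.downtime_seconds(9, 30), label: "nine nines, 30 years (s)")

# Five nines over one year, for comparison: a bit more than five minutes.
IO.inspect(NineNines.downtime_seconds(5, 1), label: "five nines, 1 year (s)")
```

Under these assumptions nine nines allows about 0.95 seconds of downtime in 30 years, which matches the "roughly 1 second" figure.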
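Structural principles
Before writing a single line of code, use the following questions to break a stateful problem down into manageable basic units:
- Environment and constraints: Is this a single node or a global cluster? What are the memory and I/O limits?
- Focal points: Where does the data live? Who "owns" the state (for example, the list of results)?
- Runtime characteristics: How many concurrent requests are there? Are they CPU-bound or I/O-bound?
- Protection strategy: Which state must survive? Which state can tolerate loss and be restarted?
- Initialization: How is the supervision tree started? Which services depend on other services?
By treating these questions as constraints, you avoid a "big ball of mud" concurrent architecture, in which every process talks directly to every other process and there is no clear hierarchy.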
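One way the answers to these questions might turn into a clear hierarchy is sketched below. It is a minimal illustration, not a prescribed design: the `Sketch.*` module names are hypothetical, an `Agent` stands in for whatever process owns the state, and `:one_for_one` is just one possible restart strategy.

```elixir
defmodule Sketch.Results do
  # Focal point: the single process that owns the list of results.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)
  def add(item), do: Agent.update(__MODULE__, &[item | &1])
  def all, do: Agent.get(__MODULE__, &Enum.reverse/1)
end

defmodule Sketch.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # Initialization: children start in list order, so the state owner
    # is running before any worker can be spawned.
    children = [
      Sketch.Results,
      {Task.Supervisor, name: Sketch.WorkerSupervisor}
    ]

    # Protection: with :one_for_one, a failing child is restarted on its own.
    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _sup} = Sketch.Supervisor.start_link([])

# Workers report to the state owner instead of talking to each other.
Sketch.WorkerSupervisor
|> Task.Supervisor.async(fn -> Sketch.Results.add(:done) end)
|> Task.await()

IO.inspect(Sketch.Results.all(), label: "results")
```

Workers here only ever talk to the state owner, which is what keeps the structure from drifting toward a ball of mud.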
QUESTION 1
Regarding the Five-Question Framework: What is the 'environment' and what are its constraints?
- The local IDE settings and compiler versions.
- The hardware/OS context, such as memory limits or whether it is a distributed cluster.
✅ Correct! This defines the physical and logical boundaries of the system.
❌ Incorrect: The environment refers to the runtime infrastructure (BEAM nodes, CPU cores, I/O limits).
QUESTION 2
What are the 'obvious focal points' in a system?
- Discrete responsibilities or modules that own specific state.
- The syntax errors highlighted by the compiler.
✅ Correct! Focal points are structural 'anchors' for state management.
❌ Incorrect: Focal points represent architectural responsibilities like results aggregation or work distribution.
QUESTION 3
What does 'What do I protect from errors?' primarily help you define?
- Password hashing algorithms.
- Supervision boundaries and restart strategies.
✅ Correct! It helps you decide what is precious (must survive) and what is disposable (can restart).
❌ Incorrect: In OTP, protection refers to fault tolerance and shielding state from process crashes.
QUESTION 4
Which question addresses the initialization order of the supervision tree?
- How do I get this thing running?
- What are the runtime characteristics?
✅ Correct! This question ensures that dependencies (like a database or a stash) are ready before workers start.
❌ Incorrect: Runtime characteristics focus on load and performance, not the boot sequence.
QUESTION 5
What is the 'Nine Nines' standard equivalent to?
- 1 minute of downtime per year.
- Roughly 1 second of outage every 30 years.
✅ Correct! This high bar is why OTP's supervision trees and fault tolerance are so critical.
❌ Incorrect: Nine Nines is 99.9999999%, which is far more stringent than the industry-standard 'five nines'.
Case Study: The Duper Duplicate Finder
Applying the Framework to a File Auditing Tool
You are building 'Duper', an application that crawls a filesystem to find duplicate files based on their hashes. It must handle massive directories without crashing or leaking memory.
Q1. Analyze the environment constraints for Duper. What is the primary bottleneck?
Solution:
The primary constraints are Filesystem IO and Memory. We cannot load 100,000 file paths into memory at once; thus, we need a stream-based or 'hungry consumer' approach.
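A stream-based traversal could look roughly like the sketch below. It is an illustration under stated assumptions rather than Duper's actual code: `Sketch.PathStream` is a hypothetical module, symbolic links are skipped to avoid cycles, and unreadable directories are silently ignored.

```elixir
defmodule Sketch.PathStream do
  # Lazily walks a directory tree and emits one regular-file path at a time.
  # Only the paths still waiting to be visited are held in memory, and each
  # directory is listed only when the consumer asks for more.
  def files(root) do
    Stream.resource(
      fn -> [root] end,
      &next/1,
      fn _pending -> :ok end
    )
  end

  # Nothing left to visit: end the stream.
  defp next([]), do: {:halt, []}

  defp next([path | rest]) do
    case File.lstat(path) do
      {:ok, %File.Stat{type: :regular}} ->
        # Emit the file and continue with the remaining paths.
        {[path], rest}

      {:ok, %File.Stat{type: :directory}} ->
        case File.ls(path) do
          {:ok, names} -> {[], Enum.map(names, &Path.join(path, &1)) ++ rest}
          {:error, _reason} -> {[], rest}
        end

      _other ->
        # Symlinks, devices, and paths that cannot be read are skipped.
        {[], rest}
    end
  end
end

# Usage: take only the first few paths without walking the whole tree.
System.tmp_dir!()
|> Sketch.PathStream.files()
|> Enum.take(3)
|> IO.inspect(label: "first paths")
```

Because the stream is lazy, a consumer that asks for one path at a time (the "hungry consumer" idea) never forces the full file list into memory.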
Q2. In Duper, what focal point is 'precious' and must be protected from worker crashes?
Solution:
The Results aggregator. While a worker hashing a corrupted file might crash, the collection of results found so far must be preserved in a supervised server that is decoupled from the workers.
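The decoupling described in this solution might be sketched as follows. The names `Sketch.DuperResults` and `Sketch.HashWorkers`, the function names, and the map-of-hashes layout are all assumptions made for illustration, and the crashing worker is simulated with a plain `exit/1`.

```elixir
defmodule Sketch.DuperResults do
  # Hypothetical results server: maps each hash to the paths that produced it.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  def add_hash_for(path, hash), do: GenServer.cast(__MODULE__, {:add, path, hash})
  def find_duplicates, do: GenServer.call(__MODULE__, :find_duplicates)

  @impl true
  def init(:no_args), do: {:ok, %{}}

  @impl true
  def handle_cast({:add, path, hash}, results) do
    {:noreply, Map.update(results, hash, [path], &[path | &1])}
  end

  @impl true
  def handle_call(:find_duplicates, _from, results) do
    # Keep only the hashes that were seen for more than one path.
    duplicates = for {_hash, paths} <- results, length(paths) > 1, do: Enum.sort(paths)
    {:reply, duplicates, results}
  end
end

# The results server starts first and is a sibling of the worker supervisor,
# so a worker crash never takes the collected results down with it.
children = [Sketch.DuperResults, {Task.Supervisor, name: Sketch.HashWorkers}]
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

Sketch.DuperResults.add_hash_for("a.txt", "h1")
Sketch.DuperResults.add_hash_for("b.txt", "h1")

# Simulate a worker that dies while hashing a corrupted file.
{:ok, pid} = Task.Supervisor.start_child(Sketch.HashWorkers, fn -> exit(:corrupted_file) end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

# The results collected before the crash are still available.
IO.inspect(Sketch.DuperResults.find_duplicates(), label: "duplicates")
```

The worker's failure may be logged, but the results server keeps its state because nothing links the two processes directly.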